Characterizing polypeptides through cleavage and mass spectrometry

ABSTRACT

Provided is a method for characterizing a polypeptide or a population of polypeptides, which method comprises: (a) contacting a sample comprising one or more polypeptides with a first cleavage agent to generate polypeptide fragments; (b) isolating one or more polypeptide fragments, each fragment comprising the N-terminus or the C-terminus of the polypeptide from which it was fragmented; (c) identifying the isolated fragments by mass spectrometry; (d) repeating steps (a)-(c) on the sample using a second cleavage agent that cleaves at a different site from the first cleavage agent; and (e) characterizing the one or more polypeptides in the sample from the fragments identified in steps (c) and (d).

This application is a 371 of PCT/GB99/03258 filed Oct. 1, 1999, which claims priority to UK application 9821393-7, filed Oct. 1, 1998.

This invention relates to a method for characterising polypeptides using mass spectrometry to identify the terminal fragments of the polypeptides. The method involves cleaving the polypeptides and isolating a single terminal peptide from each protein in a population. This invention further relates to the use of the above methods in methods of determining the expression of proteins in a tissue, cell type, or sub-cellular compartment or in analysing large protein complexes.

Techniques for profiling proteins, that is to say cataloguing the identities and quantities of proteins in a tissue, are not well developed in terms of automation or high throughput. The classical method of profiling a population of proteins is by two-dimensional electrophoresis (R. A. Van Bogelen, E. R. Olson, “Application of two-dimensional protein gels in biotechnology”, Biotechnol. Annu. Rev., 1:69-103, 1995). In this method a protein sample extracted from a biological sample is separated on a narrow gel strip. This first separation usually separates proteins on the basis of their iso-electric point. The entire gel strip is then laid against one edge of a rectangular gel. The separated proteins in the strip are then electrophoretically separated in the second gel on the basis of their size. This technology is slow and very difficult to automate. It is also relatively insensitive in its simplest incarnations.

A number of improvements have been made to increase resolution of proteins by 2-D gel electrophoresis and to improve the sensitivity of the system. One method to improve the sensitivity of 2-D gel electrophoresis and its resolution is to analyse the protein in specific spots on the gel by mass spectrometry (Jungblut P., Thiede B., “Protein identification from 2-DE gels by MALDI mass spectrometry”, Mass Spectrom. Rev. 16:145-162, 1997). One such method is in-gel tryptic digestion followed by analysis of the tryptic fragments by mass spectrometry to generate a peptide mass fingerprint. If sequence information is required, tandem mass spectrometry analysis can be performed.

More recently attempts have been made to exploit mass spectrometry to analyse whole proteins that have been fractionated by liquid chromatography or capillary electrophoresis (Dolnik V., “Capillary zone electrophoresis of proteins”, Electrophoresis 18, pp 2353-2361, 1997). In-line systems exploiting capillary electrophoresis mass spectrometry have been tested. The analysis of whole proteins by mass spectrometry, however, suffers from a number of difficulties. The first difficulty is the analysis of the complex mass spectra resulting from multiple ionisation states accessible by individual proteins. The second major disadvantage is that the mass resolution of mass spectrometers is at present quite poor for high molecular weight species, i.e. for ions that are greater than about 4 kilodaltons in mass. Thus, resolving proteins that are close in mass is difficult. A third disadvantage is that further analysis of whole proteins by tandem mass spectrometry is difficult as the fragmentation patterns for whole proteins are extremely complex and difficult to interpret.
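The first two difficulties can be illustrated with a short calculation. The following is a minimal sketch, assuming positive-mode ionisation by simple protonation; the two protein masses are hypothetical and are not taken from any reference cited here.

```python
PROTON_MASS = 1.007276  # Da, mass of a proton


def mz(neutral_mass, charge):
    """m/z of an ion formed by adding `charge` protons to a neutral molecule."""
    return (neutral_mass + charge * PROTON_MASS) / charge


# Two hypothetical whole proteins that differ by 20 Da.
protein_a, protein_b = 20000.0, 20020.0

# Each protein appears at several charge states, and the spacing between the
# two proteins shrinks as the charge grows (20 Da / z).
for z in (10, 15, 20):
    print(z, round(mz(protein_a, z), 3), round(mz(protein_b, z), 3))
```

Each protein contributes one peak per charge state, so a mixture of whole proteins gives overlapping series of peaks, and the separation between two proteins of similar mass is divided by the charge.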

WO 98/32876 discloses methods for profiling a population of proteins by isolating a single peptide from each protein in the population. The method disclosed in this application comprises the following steps:

1. Capturing a population of proteins onto a solid phase support by one terminus of each protein in the population.

2. Cleaving the captured proteins with a sequence specific cleavage agent.

3. Washing away peptides generated by the cleavage agent not retained on the solid phase support.

4. Releasing the terminal peptides retained on the solid phase support.

5. Analysing the released terminal peptides, preferably identifying and quantifying each peptide in the mixture. The analysis is preferably performed by mass spectrometry.

In this application, the C-terminus is indicated to be more preferable as the terminus by which to capture a population of proteins, since the N-terminus is often blocked. In order to capture a population of proteins by the C-terminus, the C-terminal carboxyl group must be distinguished from other reactive groups on a protein and must be reacted specifically with a reagent that can effect immobilisation. In many C-terminal sequencing chemistries the C-terminal carboxyl group is activated to promote formation of an oxazolone group at the C-terminus. During the activation of the C-terminal carboxyl, side chain carboxyls are also activated but these cannot form an oxazolone group. It has been reported that the C-terminal oxazolone is less reactive to nucleophiles under basic conditions than the activated side-chain carboxyls, offering a method of selectively capping the side chain carboxyl groups (V. L. Boyd et al., Methods in Protein Structure Analysis: 109-118, Plenum Press, Edited M. Z. Atassi and E. Appella, 1995). Other more reactive side chains can be capped prior to the activation of the carboxyls using a variety of conventional reagents. In this way all reactive side chains can be capped and the C-terminus can be specifically labelled.

EP-A-0 594 164 describes a method of isolating a C-terminal peptide from a protein in a method to allow sequencing of the C-terminal peptide using N-terminal sequencing reagents. In this method the protein of interest is digested with an endoprotease which cleaves at the C-terminal side of lysine residues. The resultant peptides are reacted with DITC polystyrene which reacts with all free amino groups. N-terminal amino groups that have reacted with the DITC polystyrene can be cleaved with trifluoroacetic acid (TFA) thus releasing the N-terminus of all peptides. The epsilon amino group of lysine is not cleaved, however, and all non-terminal peptides are thus retained on the support and only C-terminal peptides are released. According to this patent the C-terminal peptides are recovered for micro-sequencing.

However, none of the above methods is capable of readily identifying all of the proteins present in a population, and in fact many of these methods cannot uniquely identify all of the proteins present at all.

Accordingly, it is an object of the present invention to overcome the problems associated with the prior art, and to provide a method for characterising polypeptides which is capable of uniquely identifying a greater proportion of proteins present in a sample than is possible using prior art methods. It is thus an object of the present invention to provide improved methods for isolating a single terminal peptide from each protein in a mixture, complex or population (prepared in an arbitrary fashion). A further object of this invention is to provide such methods that are amenable to automation. It is also an object of this invention to provide methods to improve the sensitivity of the mass spectrometry analysis of terminal peptides generated by the methods of this invention.

Accordingly the present invention provides a method for characterising a polypeptide or a population of polypeptides, which method comprises:

(a) contacting a sample comprising one or more polypeptides with a first cleavage agent to generate polypeptide fragments;

(b) isolating one or more polypeptide fragments, each fragment comprising the N-terminus or the C-terminus of the polypeptide from which it was fragmented;

(c) identifying the isolated fragments by mass spectrometry;

(d) repeating steps (a)-(c) on the sample using a second cleavage agent that cleaves at a different site from the first cleavage agent; and

(e) characterising the one or more polypeptides in the sample from the fragments identified in steps (c) and (d).

In the context of the present invention a polypeptide includes any peptide comprising two or more amino acids, and includes any protein. In the present context, the sample used in the repeat step (d) is the same sample as that used in step (a). The “same sample” may mean (for example) one of a plurality of portions separated from one original sample, there being one portion for each repetition of the steps (a)-(c).

The steps (a)-(c) can be repeated any number of times, e.g. once, twice or more times. For each repetition a cleavage agent is used which cleaves at a different site than the previous cleavage agents used. This gives a different set of terminal fragments for each repetition, even though the same sample is being treated. If two or more proteins cannot be distinguished by means of their terminal fragments produced by cleavage at a particular site (e.g. if two or more proteins have terminal fragments sharing the same or indistinguishable mass) then cleavage at a different site may (for the same two or more proteins) produce terminal fragments of different masses. Thus the resolution of the method is enhanced by repetition using a plurality of different cleavage agents.
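The effect of a second cleavage agent can be sketched in silico. The example below is an illustration only: the two sequences are hypothetical, cleavage is assumed to be complete and to occur C-terminal to lysine (as for Lys-C) or to methionine (as for cyanogen bromide), and only unmodified C-terminal fragments are considered.

```python
# Monoisotopic residue masses (Da) for the residues used in this toy example.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "V": 99.06841,
    "L": 113.08406, "K": 128.09496, "E": 129.04259, "M": 131.04049,
}
WATER = 18.01056


def c_terminal_fragment(sequence, cleave_after):
    """C-terminal fragment left after cleaving C-terminal to each residue in `cleave_after`."""
    # Scan backwards for the last cleavage site that is not the final residue.
    for i in range(len(sequence) - 2, -1, -1):
        if sequence[i] in cleave_after:
            return sequence[i + 1:]
    return sequence  # no internal site: the whole chain is the terminal fragment


def peptide_mass(peptide):
    """Monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[r] for r in peptide) + WATER


# Hypothetical sequences chosen so that the Lys-C fragments are isobaric.
proteins = {"protein_1": "GAMSKALV", "protein_2": "MGESKLAV"}
agents = {"Lys-C": "K", "cyanogen bromide": "M"}

for agent, sites in agents.items():
    for name, seq in proteins.items():
        frag = c_terminal_fragment(seq, sites)
        print(agent, name, frag, round(peptide_mass(frag), 4))
```

With cleavage at lysine the two toy proteins give fragments ALV and LAV, which have the same composition and therefore the same mass. With cleavage at methionine they give SKALV and GESKLAV, which differ in mass.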

The cleavage agents used in the present invention are not especially limited. Preferred cleavage agents include an endopeptidase or a chemical cleavage agent. More preferably the cleavage agent employed comprises Lys-C endopeptidase, a thiocyanate compound, cyanogen bromide, BNPS-skatole, trypsin, chymotrypsin, and/or thrombin.

The present invention also provides a method for characterising a polypeptide or a population of polypeptides, which method comprises:

(f) contacting a sample comprising one or more polypeptides with a first capping agent in a first capping step to introduce capping groups on one or more reactive side chains of the polypeptides;

(g) contacting the sample with a cleavage agent to generate polypeptide fragments;

(h) isolating one or more polypeptide fragments, each fragment comprising the N-terminus or the C-terminus of the polypeptide from which it was fragmented;

(j) identifying the isolated fragments by mass spectrometry;

(k) repeating steps (f)-(j) on the sample using a second capping agent that introduces capping groups at the same side chains as the first capping step, but uses capping groups having different mass than the capping groups used in the first capping step; and

(l) characterising the one or more polypeptides in the sample from the fragments identified in steps (j) and (k).

The method may use the same cleavage agent or different cleavage agents for each repetition, as long as capping groups having different masses are employed for each repetition. The steps (f)-(j) can be repeated any number of times, e.g. once, twice or more times. For each repetition a capping group is used which has a different mass than the corresponding previous capping group that was used. This gives a different set of terminal fragments for each repetition, even though the same sample is being treated with the same cleavage agent. If two or more proteins cannot be distinguished from their terminal fragments capped with a particular capping group or groups (e.g. if two or more proteins have terminal fragments sharing the same or indistinguishable mass) then use of one or more capping groups having a different mass to the corresponding groups used in the previous capping step may (for the same two or more proteins) produce terminal fragments of different masses. Thus the resolution of the method is enhanced by repetition using a plurality of capping groups, each having a different mass.
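The same reasoning can be sketched for capping groups of different mass. The example below is a minimal illustration under stated assumptions: only lysine side chains are treated as capped, the caps are an acetyl group and its trideuterated analogue (mass shifts of about 42.0106 Da and 45.0294 Da), and the two terminal fragments are hypothetical.

```python
# Monoisotopic residue masses (Da) for the residues used in this toy example.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "V": 99.06841,
    "L": 113.08406, "K": 128.09496,
}
WATER = 18.01056

# Assumed mass added per capped side chain (Da).
CAP_MASS = {"acetyl": 42.01057, "d3-acetyl": 45.02940}


def capped_mass(peptide, cap, capped_residues="K"):
    """Mass of a peptide with every residue in `capped_residues` carrying one cap."""
    base = sum(RESIDUE_MASS[r] for r in peptide) + WATER
    n_caps = sum(peptide.count(r) for r in capped_residues)
    return base + n_caps * CAP_MASS[cap]


# Hypothetical terminal fragments: acetyl-lysine is isobaric with Gly + Leu.
fragment_1, fragment_2 = "AKV", "AGLV"

for cap in CAP_MASS:
    m1 = capped_mass(fragment_1, cap)
    m2 = capped_mass(fragment_2, cap)
    print(cap, round(m1, 4), round(m2, 4), round(abs(m1 - m2), 4))
```

With the acetyl cap the two fragments coincide in mass, because one contains a capped lysine and the other contains glycine plus leucine. With the heavier cap only the lysine-containing fragment shifts, so the two fragments separate by about 3 Da.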

In the above-described methods, when the side chains of the proteins in the population are to be capped, the side chains to be capped may comprise one or more of the following:

the NH₂ side chain in arginine;

the NH₂ side chain in asparagine;

the NH₂ side chain in glutamine;

the NH₂ side chain in lysine;

the COOH side chain in aspartic acid;

the COOH side chain in glutamic acid;

the OH side chain in serine;

the OH side chain in threonine;

the OH side chain in thyroxine;

the OH side chain in tyrosine; and

the SH side chain in cysteine.

The capping agents used in the present invention are not especially limited. Preferred capping agents include iodoacetate compounds, isocyanate compounds (e.g. phenyl isocyanate), silyl compounds (e.g. trimethylchlorosilane), anhydride compounds (e.g. acetic anhydride and trimethylacetic anhydride), vinylsulphone compounds (e.g. phenyl vinylsulphone and methyl vinylsulphone) and vinyl pyridine derivatives (e.g. 4-vinyl pyridine). The mass of the capping group can be altered by substitution. Such substitution is not particularly limited, provided that the capping reaction is still able to proceed. Substitution with deuterium or a halogen such as an iodine group is preferred.

Thus, in one preferred aspect, this invention provides a method of generating a protein expression profile with improved resolution by combining two or more protein expression profiles where the profiles have been generated from the same protein mixture but which have been treated with two or more different sequence specific cleavage agents. A different sequence specific cleavage agent is used in the generation of each individual profile. In this context resolution refers to the proportion of proteins in a population that can be identified uniquely from a database on the basis of their terminal peptide mass alone.
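Resolution in this sense can be computed for a small database. The following is a sketch only: the masses are hypothetical, the mass tolerance is an assumed parameter, and each protein's terminal peptide masses from the different profiles are treated as one tuple, which is an idealisation of how profiles would be combined in practice.

```python
def resolution(mass_table, tol=0.01):
    """Fraction of proteins whose tuple of terminal peptide masses
    matches no other protein in the table within `tol` (Da)."""
    names = list(mass_table)

    def same(a, b):
        # Two proteins are indistinguishable only if every profile agrees.
        return all(abs(x - y) <= tol for x, y in zip(a, b))

    unique = [
        n for n in names
        if not any(same(mass_table[n], mass_table[m]) for m in names if m != n)
    ]
    return len(unique) / len(names)


# Hypothetical terminal peptide masses (Da) from one cleavage agent only.
single_profile = {
    "P1": (301.20,), "P2": (301.20,), "P3": (450.25,), "P4": (450.25,),
}
# The same proteins with a second profile from a different cleavage agent.
combined_profiles = {
    "P1": (301.20, 516.33), "P2": (301.20, 702.39),
    "P3": (450.25, 610.30), "P4": (450.25, 610.30),
}

print(resolution(single_profile))
print(resolution(combined_profiles))
```

In this toy table no protein is unique on the first profile alone, while adding the second profile separates P1 from P2. P3 and P4 remain unresolved and would need a further cleavage agent or a capping group of different mass.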

In a further preferred aspect this invention provides a method of generating a protein expression profile with improved resolution by combining two or more expression profiles where the profiles have been generated from the same protein mixture and have been cleaved with the same sequence specific cleavage reagent but where specific side chains, such as amino acid side chains, have been capped with two or more capping reagents with different masses. A different set of capping agents is used in the generation of each individual profile.

More preferred aspects of the invention will now be discussed in detail. In a first preferred aspect, this invention provides a method of generating a population of terminal peptides comprising the steps of:

1. Immobilising a population of proteins onto a solid support.

2. Contacting the immobilised population of proteins with a reagent which reacts with the alpha-amino group at the N-terminus of the proteins and with any lysine epsilon-amino groups. This reagent may optionally also react with any serine and threonine side chains in order to cap them. In this context capping means reacting the side-chains with a reagent which will render the side-chains unreactive to any other reagents used in the subsequent steps of this method. If the reagent does not cap side chains other than amine groups, additional capping agents may be applied to cap other reactive side chains so that substantially all reactive side-chains are capped.

3. Contacting the protein population with a reagent that will ‘activate’ the free carboxyl groups of the proteins. The activation reagent should preferably promote the formation of an oxazolone group at the C-terminal activated carboxyls of the proteins in the population.

4. Contacting the resultant derivatised proteins with a nucleophile under basic conditions to cap the activated side chain carboxyl derivatives. The C-terminal oxazolone is less reactive to nucleophiles than the activated side chain carboxyls. In this way all reactive side chain functionalities are capped. Preferably the nucleophile is added as a thiocyanate salt.

5. Contacting the reaction mixture with an appropriate acid, preferably TFA, which promotes reaction of the C-terminal oxazolone group with thiocyanate to give a C-terminal thiohydantoin derivative.

6. Contacting the proteins with a cleavage agent to cleave the C-terminal thiohydantoin from the derivatised proteins to expose the carboxyl of the penultimate amino acid in each protein.

7. Contacting the exposed penultimate carboxyl with an appropriate reagent to permit solid phase capture of proteins by the C-terminus.

8. Contacting the C-terminally modified proteins with a sequence specific cleavage agent.

9. Capturing the C-terminal peptides onto a solid phase support and washing away the non-terminal peptides.

10. Optionally reacting the terminal amine group of the captured peptides with a mass spectrometry sensitisation reagent.

11. Releasing the captured C-terminal peptides from the solid phase support.

12. Recovering the released peptides.

13. Analysing the C-terminal peptides by mass spectrometry.

Any one of the above preferred steps, or any combination of the above preferred steps, can be utilised in the general methods of the invention as already described above.

In a second preferred aspect this invention provides a method of generating a population of terminal peptides comprising the steps of:

1. Immobilising a population of proteins onto a solid phase support.

2. Contacting the immobilised population of proteins with a reagent which reacts with the alpha-amino group at the N-terminus of the proteins and with any lysine epsilon-amino groups. This reagent may optionally also react with any serine and threonine side chains in order to cap them. In this context capping means reacting the side-chains with a reagent which will render the side-chains unreactive to any other reagents used in the subsequent steps of this method. If the reagent does not cap side chains other than amine groups, additional capping agents may be applied to cap other reactive side chains so that substantially all reactive side-chains are capped.

3. Contacting the protein population with a reagent that will ‘activate’ the free carboxyl groups of the proteins. The activation reagent should preferably promote the formation of an oxazolone group at the C-terminal activated carboxyls of the proteins in the population.

4. Contacting the resultant derivatised proteins with a nucleophile under basic conditions to cap the activated side chain carboxyl derivatives. The C-terminal oxazolone is less reactive to nucleophiles than the activated side chain carboxyls. In this way all reactive side chain functionalities are capped. Preferably the nucleophile is added as a salt with an inert anion.

5. Hydrolysing the C-terminal oxazolone to regenerate the C-terminal carboxyl species.

6. Contacting the proteins with an activation agent to activate the C-terminal carboxyl group.

7. Contacting the activated terminal carboxyl group with an appropriate reagent to permit solid phase capture of proteins by the C-terminus.

8. Contacting the C-terminally modified proteins with a sequence specific cleavage agent.

9. Capturing the C-terminal peptides onto a solid phase support and washing away the non-terminal peptides.

10. Optionally reacting the terminal amine group of the captured peptides with a mass spectrometry sensitisation reagent.

11. Releasing the captured C-terminal peptides from the solid phase support.

12. Recovering the released peptides.

13. Analysing the C-terminal peptides by mass spectrometry.

Any one of the above preferred steps, or any combination of the above preferred steps, can be utilised in the general methods of the invention as already described above.

In a third preferred aspect, this invention provides a method of generating a population of terminal peptides from a population of proteins comprising the steps of:

1. Digesting a population of proteins completely with a Lys-C specific cleavage enzyme, i.e. a reagent that cuts at the peptide bond immediately adjacent to a lysine residue on the C-terminal side of that residue.

2. Contacting the resultant peptides with an activated solid support that will react with free amino groups.

3. Contacting the captured peptides with a reagent which cleaves at the alpha amino groups of each peptide on the support. All peptides that are not C-terminal will have a lysine residue covalently linking them to the solid support. Thus free C-terminal peptides are selectively released.

4. Recovering the released peptides.

5. Optionally contacting the released peptides with reagents to cap reactive side chains.

6. Optionally reacting the terminal amine group of the captured peptides with a mass spectrometry sensitisation reagent.

7. Analysing the peptides by mass spectrometry.

Any one of the above preferred steps, or any combination of the above preferred steps, can be utilised in the general methods of the invention as already described above.

In order that only the fragments containing lysine residues remain bound to the solid support via two amino groups (so that all fragments except those containing lysine groups are released from the solid support during the releasing step), the fragments are preferably attached to the solid phase at a pH at which the side chain NH₂ of lysine is not protonated. Thus, this step is preferably carried out at a pH of from 11 to 11.5.
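The selection logic of the third aspect can be sketched as follows. This is an illustration under simplifying assumptions: digestion is complete with no missed cleavages, every amino group couples to the support, and the chemistry of the release step is reduced to the rule that a peptide stays bound if it contains a lysine. The sequences are hypothetical.

```python
def lys_c_digest(sequence):
    """Split a sequence C-terminal to every lysine (complete digestion assumed)."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        if residue == "K":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def released_peptides(sequence):
    """Peptides freed when only the alpha-amino linkage to the support is cleaved.

    A peptide containing lysine stays attached through its epsilon-amino group,
    so only lysine-free peptides are released.
    """
    return [p for p in lys_c_digest(sequence) if "K" not in p]


print(lys_c_digest("GAMSKALVKEEL"))
print(released_peptides("GAMSKALVKEEL"))
# Edge case: a protein ending in lysine releases no peptide in this model.
print(released_peptides("GAKSLK"))
```

Every internal Lys-C peptide ends in lysine and is retained, so the only peptide released is the C-terminal one. The last call shows a limitation of the model: if the protein itself ends in lysine, its C-terminal peptide is also retained.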

Generation of a population of C-terminal peptides

The use of C-terminal sequencing reagents

Various preferred embodiments of the above aspects of this invention are discussed below.

In the first step of the first and second preferred aspects of this invention, a population of proteins is immobilised onto a solid phase support, preferably non-covalently. Zitex (porous Teflon from Norton Performance Plastics, Wayne, N.J.) membranes can be used to effect non-covalent immobilisation of proteins on a solid phase support (Bailey et al., “Automated carboxy-terminal sequence analysis of peptides and proteins using diphenyl phosphoroisothiocyanatidate”, Protein Science 1:1622-1633, 1992; Bailey et al., Anal. Biochem. 212:366-374, 1993). Polyvinylidenedifluoride membranes (Millipore) can also be used to immobilise proteins.

Step 2 of the first and second preferred aspects of this invention involves capping of the reactive side chains of a population of proteins. It is well known in the art that the reactive side-chain functionalities can be selectively capped. Reactive side-chains include lysine, serine, threonine, tyrosine and cysteine. Cysteine is often cross-linked with itself to form disulphide bridges. For the purposes of this invention it is preferred that these bridges are broken. This can be effected by reducing the disulphide bridge to a pair of thiols with mercaptoethanol. Thiols can be selectively capped by iodoacetate (Aldrich) under mildly basic conditions which promote the formation of a thiolate ion (Mol. Microbiol 5:2293, 1991). An appropriate mild base would be a carbonate. In other embodiments the population of proteins may be treated with an isocyanate compound. Isocyanates will react almost exclusively with the alpha-amino group at the N-terminus of the proteins and with any lysine epsilon-amino groups, i.e. with primary amines, under mild conditions, i.e. at room temperature in a neutral solvent, to give a urea derivative. These reagents can also be made to react with any hydroxyl bearing side-chains, such as serine, threonine and tyrosine side chains, in the presence of an appropriate catalyst such as pyridine or a tin compound, such as dibutylstannyl laurate, to give a urethane derivative. In an alternative embodiment the population of proteins can be treated with a silyl compound such as trimethylchlorosilane (Sigma). These compounds react readily with most reactive functional groups. Amine derivatives are not stable under aqueous conditions and so can be hydrolysed back to the free amine if that is desired. The above examples are intended to illustrate methods of capping reactive side-chain functionalities and are not intended to limit the scope of this invention. A wide variety of protective groups are known in the art and it is envisaged that a large proportion of these could be used to complete the steps of this invention.

In step 3 of the first and second preferred aspects of this invention the carboxyl side chains are then ‘activated’. Acetic anhydride has been widely used for this purpose, which generates mixed anhydrides at carboxyl groups. In some embodiments, step 2 of the first and second preferred aspects of this invention may be combined with step 3. Activation of side chain carboxyls with an anhydride compound also results in capping of reactive side chains, such as lysine, serine, threonine and tyrosine. In embodiments where steps 2 and 3 are combined, it is preferred that the activation reagent chosen is stable as a capping agent for the reactive side chains, as some anhydride derivatives are not stable under basic or acidic conditions. Trimethylacetic anhydride is more stable under these conditions through steric effects, for example. An alternative, more preferred, activation reagent is Woodward's Reagent K (N-Ethyl-5-phenylisoxazolium-3-sulphonate available from Sigma). In another preferred embodiment of this aspect of the invention, activation is achieved by treatment with a reactive phosphate such as tetraphenyl pyrophosphate (Aldrich) or diphenyl phosphochloridate. According to U.S. Pat. No. 5,665,603, the activation step required in peptide sequencing methods to generate a carboxyl derivative at the C-terminus of a peptide or polypeptide which can react to form a thiohydantoin can be effected more efficiently under milder conditions with acyl phosphate compounds, such as diphenyl phosphochloridate or tetraphenyl pyrophosphate (Aldrich). The C-terminal carboxyl is reacted with the reactive phosphate in the presence of a base to deprotonate the carboxyl group. Preferred bases are tri-ethylamine, diisopropylethylamine or pyridines, i.e. a base which will not react with the resultant acyl phosphates. The phosphorylation reaction to activate the C-terminal carboxyl preferably uses equimolar quantities of reactive phosphate and base. These two reagents are added in large excess to a polypeptide in a polar, aprotic solvent, e.g. acetonitrile (ACN), dimethylformamide or an ether, preferably ACN. The reaction is typically complete in 5 to 10 minutes and usually less than 30 minutes at room temperature. Temperature can vary: the reactions are performed at 55° C. in an automated sequencer in line with other reactions taking place. The C-terminal activated derivatives of carboxyl groups will spontaneously cyclise to form an oxazolone intermediate whilst the side chain carboxyls remain activated.

The activated side-chain carboxyl derivatives apparently react with nucleophiles under basic conditions whilst the C-terminal oxazolone group is much less reactive to nucleophiles (V. L. Boyd et al., Methods in Protein Structure Analysis: 109-118, Plenum Press, Edited M. Z. Atassi and E. Appella, 1995). Hence, in step four of the first and second preferred aspects of this invention the protein population is contacted with a nucleophile under basic conditions. A preferred nucleophile is a primary amine compound. If amine nucleophiles are used the side chains can be amidated. Preferred amine nucleophiles are piperidine, methylamine, ethylamine or another amine, added with an appropriate base. To perform the amidation reaction equimolar quantities of base and nucleophile are added in a polar aprotic solvent, preferably acetonitrile, with the reagents in a large excess over the mixture of proteins. Ammonia generates asparagine from activated aspartic acid residues and glutamine from glutamic acid residues. This may reduce the information in the peptide's mass and ammonia may not be a preferred nucleophile. In an alternative embodiment an alcohol can be used to esterify the activated side-chain carboxyl derivatives under acidic conditions. An appropriate alcohol, e.g. methanol, can be added in an appropriate, anhydrous acidic solvent, e.g. TFA.

In the first preferred aspect of this invention the nucleophile may be added as a thiocyanate salt. The oxazolone reacts under acidic conditions with isothiocyanate and so after an initial period under basic conditions to promote amidation the reaction is acidified with trifluoroacetic acid to promote C-terminal thiohydantoin formation.

In the second preferred aspect of this invention, the amine nucleophile may be added as a salt with an inert group, e.g. a carbonate. Preferably a volatile salt is used to facilitate removal of unused reagent. In this way the oxazolone can be prevented from reacting so that the oxazolone can be hydrolysed to regenerate a free carboxyl at the C-terminus. In a further embodiment the amine nucleophile can be added as a salt with a biotinylation agent with an appropriate functionality to react with the oxazolone.

In the first aspect of this invention where the nucleophile is added as a thiocyanate salt, the reaction mixture is then acidified with an appropriate acid, preferably TFA, which promotes reaction of the C-terminal oxazolone group with thiocyanate to give a C-terminal thiohydantoin derivative. The C-terminal thiohydantoin is then cleaved from the derivatised proteins to expose the carboxyl of the penultimate amino acid in each protein. Cleavage of the thiohydantoin from the C-terminus can be effected by a variety of methods including hydrolysis by acid or base. More preferably cleavage is effected by a reagent such as sodium trimethylsilanolate in an alcoholic solvent (J. M. Bailey et al., Protein Science 1:68-80, 1992).

In step 7 of the first and second preferred aspects of this invention the exposed terminal carboxyl may then be reacted with an appropriate reagent to permit solid phase capture of proteins by the C-terminus. A reagent such as biotinamidocaproyl hydrazide is reactive with free carboxylic acids. A reagent such as 5-(biotinamido)pentylamine can be used, allowing capture by avidin. Prior to reaction with this biotinylation reagent, free carboxyl termini must be activated. A variety of activation agents can be used, but should preferably not promote the formation of an oxazolone. An anhydride compound that is sterically hindered might be appropriate. In an alternative embodiment the activated terminal carboxyl can be reacted with a solid phase support functionalised with an appropriate group to react with the activated carboxyl. A free amine group would be appropriate. To permit selective release the carboxyl reactive functionality should be linked to the support by a cleavable linker. A variety of cleavable linkers are known in the art. Photocleavable linkers are well known in the art (Lloyd-Williams et al., Tetrahedron 49:11065-11133, 1993). There are also numerous chemically cleavable linkers, e.g. thioesters may be cleaved by hydroxylamine.

In step 8 of the first and second preferred aspects of this invention the C-terminally modified proteins are then treated with a sequence specific cleavage agent. Preferred cleavage agents are chemical reagents which are volatile, permitting easy removal of unreacted reagent. Appropriate chemical cleavage reagents include cyanogen bromide which cleaves at methionine residues and BNPS-skatole which cleaves at tryptophan residues (D. L. Crimmins et al., Anal. Biochem. 187:27-38, 1990). In other embodiments sequence specific endoproteinases such as trypsin, chymotrypsin, thrombin or other enzymes may be used.

In step 9 of the first and second preferred aspects of this invention the terminally modified peptides are captured on a solid phase support. In embodiments where the terminal carboxyl is biotinylated, the C-terminal peptides may then be captured onto a solid phase support derivatised with streptavidin. The non-terminal peptides can then be washed away.

In step 10 of the first and second preferred aspects of this invention the captured C-terminal peptides are optionally reacted with a mass spectrometry sensitisation reagent. The ion detectors of a mass spectrometer are extremely sensitive, and can detect the arrival of single ions. Thus detection of ions is not the limiting factor that determines the sensitivity of a mass spectrometer. Generally, the most limiting factor is the ionisation of the analyte. In a typical electrospray source or FAB source, only one in a thousand molecules of analyte will actually ionise and be detected. For the purposes of improving this process, it is desirable to introduce a sensitisation compound into the peptides for detection that will pre-ionise the peptide. Preferred compounds include quaternary ammonium ions or metal ion chelation agents. An exemplary compound might be 4-(3-pyridylmethylaminocarboxypropyl)phenyl isocyanate (E. J. Bures et al., “Synthesis and evaluation of a panel of novel reagents for stepwise degradation of polypeptides”, Methods in Protein Structure Analysis, Plenum Press, New York, 1995 disclose the use of the isothiocyanate form of this compound as a mass spectrometry sensitiser). The introduction of a sensitisation group not only improves sensitivity but also reduces the risk of competition for ionisation by analyte molecules. This means that all peptides should be more evenly represented in the mass spectrum.

In step 11 of the first and second preferred aspects of this invention the captured C-terminal peptides are then released from the solid phase support. In embodiments where the terminal carboxyl is biotinylated, the avidin captured peptides can be released by treatment with acid. Preferably TFA is used to facilitate recovery of the released peptides, as this is volatile and can be readily evaporated to permit recovery of the peptides.

In step 12 of the first and second preferred aspects of this invention the released peptides are recovered. In step 13 of the third and fourth aspects of this invention the peptides are analysed by mass spectrometry. In one embodiment the peptides are embedded in a MALDI matrix, such as cinnamic acid, on an appropriate support and are analysed by MALDI mass spectrometry to determine a peptide mass fingerprint for the population of C-terminal peptides. MALDI is a preferred analysis technique as this ionisation technique favours the formation of [M+H]+ ions. Thus there is usually only one major peak in the mass spectrum for each peptide.

In a further embodiment the recovered peptides may be dissolved in an appropriate solvent and can be analysed by a spraying inlet system such as electrospray ionisation mass spectrometry. Similarly Fast Atom Bombardment (FAB) and related interfaces may be used. This may include in-line liquid chromatography separation of peptides prior to analysis by mass spectrometry. Capillary electrophoresis may be used, or HPLC or capillary iso-electric focusing. Dynamic FAB is a particularly preferred method as this method of ionisation promotes the generation of ions in the form in which they exist in the matrix used to introduce them into the mass spectrometer. This means that the ions present in the mass spectrum ought to be the same as they are in solution if a liquid matrix is used. Since in the present invention reactive side chains may be capped and a mass spectrometry sensitisation agent may be introduced into the peptides to be analysed, it is possible to ensure that all peptides have only a single charge in solution, which is represented as a single mass peak in the final mass spectrum of a peptide population. Tandem mass spectrometers can be coupled to a spraying interface or to a FAB interface. Tandem mass spectrometry permits sequence information to be determined for a peptide and permits identification of covalent modifications of the protein.

The use of DITC glass

Various embodiments of the third preferred aspect of this invention are discussed here.

In step 1 of the third preferred aspect of this invention a population of proteins is completely digested with a Lys-C specific cleavage enzyme, e.g. endoproteinase Lys-C from Lysobacter enzymogenes (Boehringer Mannheim).

In step 2 of the third preferred aspect of this invention the resultant peptides are contacted with a solid support which reacts with amines. In one embodiment the peptide population is reacted with isothiocyanato glass (DITC glass, Sigma) in the presence of a base. This captures all peptides to the support through any free amino groups.

In step 3 of the third preferred aspect of this invention the captured peptides are contacted with a reagent which cleaves at the alpha amino groups of each peptide on the support. In embodiments where DITC glass is used, the peptides are treated with an appropriate volatile acid (TFA), which cleaves the N-terminal amino acid from each peptide on the support. All peptides that are not C-terminal will have a lysine residue covalently linking them to the solid support. Thus free C-terminal peptides are selectively released.
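The selection logic of steps 1 to 3 can be sketched in silico. The following is a minimal illustration only, assuming idealised Lys-C digestion (cleavage after every lysine, no missed cleavages) and complete removal of the N-terminal residue by the acid step. The function names and the test sequence are hypothetical and are not taken from this specification.

```python
def lys_c_digest(sequence):
    """Split a protein sequence after every lysine (K); idealised Lys-C digestion."""
    peptides, start = [], 0
    for index, residue in enumerate(sequence):
        if residue == "K":
            peptides.append(sequence[start:index + 1])
            start = index + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def released_c_terminal_peptide(sequence):
    """Return the peptide released from an amine-reactive support, or None.

    Every peptide is bound through its alpha-amino group; peptides that
    contain lysine are also bound through the lysine side chain. Removing
    the N-terminal residue therefore frees only a lysine-free peptide,
    which can only be the C-terminal one.
    """
    last = lys_c_digest(sequence)[-1]
    if "K" in last:
        # The protein ends in lysine: its last peptide stays on the support.
        return None
    # The N-terminal residue of the peptide remains on the support.
    return last[1:]


print(released_c_terminal_peptide("MAKGTKLSEQ"))  # peptides: MAK, GTK, LSEQ
```

Under these assumptions a protein with no lysine at all is treated as a single peptide and is released whole, minus its first residue.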

In step 4 of the third preferred aspect of this invention the released peptides can be recovered from the TFA by evaporating the TFA solvent used to cleave the peptides from the support.

In step 5 of the third preferred aspect of this invention the released peptides may be contacted with reagents to cap reactive side chains. Appropriate reagents include those discussed above.

In step 6 of the third preferred aspect of this invention the peptides are optionally reacted with a mass spectrometry sensitisation reagent. See the discussion above regarding step 10 of the first and second preferred aspects of this invention.

In step 7 of the third preferred aspect of this invention the peptides are analysed by mass spectrometry. In one embodiment the terminal peptides are embedded in a MALDI matrix such as cinnamic acid and are analysed by MALDI mass spectrometry to determine a peptide mass fingerprint for the population of C-terminal peptides.

As in the other preferred aspects of this invention, alternative methods of analysing peptides by mass spectrometry can be used. Thus in alternative embodiments the recovered peptides may be dissolved in an appropriate solvent and can be analysed by a spraying inlet system such as electrospray ionisation mass spectrometry. Similarly Fast Atom Bombardment and related interfaces may be used. This may also include in-line liquid chromatography separation prior to mass spectrometry. Tandem mass spectrometers can be coupled to a spraying interface or to a FAB interface. Tandem mass spectrometry permits sequence information to be determined for a peptide and permits identification of covalent modifications of the protein.

EXAMPLES

A series of short computer programs (written in PERL) were employed to analyse the SWISSPROT public domain database of protein sequences to determine, for a given organism, what proportion of proteins could be identified uniquely on the basis of their terminal peptide masses alone. Proteins were extracted from the SWISSPROT database release 35. Analyses were performed on data from the H. influenzae genome. The genome of this organism has been completely sequenced and nearly all predicted open reading frames have been identified. There were 1882 H. influenzae proteins present in this release of the database. Profiles with cleavage at methionine by cyanogen bromide and cleavage at tryptophan were tested. It was assumed that the terminal amino acid is not cleaved by the profiling chemistry.
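The kind of calculation such programs perform can be sketched as follows. This is not the original PERL code; it is a minimal Python illustration in which the function names, the test sequence and the use of monoisotopic residue masses are all assumptions. Skipping the final residue as a cleavage site is one reading of the assumption stated above, and capping groups are ignored.

```python
# Standard monoisotopic residue masses in daltons (the choice of
# monoisotopic rather than average masses is an assumption).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per free peptide


def c_terminal_fragment(sequence, cleave_after):
    """Return the fragment that follows the last cleavable residue.

    The final residue is not considered as a cleavage site. Returns None
    when the sequence has no usable site.
    """
    site = sequence.rfind(cleave_after, 0, len(sequence) - 1)
    if site == -1:
        return None
    return sequence[site + 1:]


def peptide_mass(peptide):
    """Mass of an unmodified peptide: residue masses plus one water."""
    return sum(RESIDUE_MASS[residue] for residue in peptide) + WATER


protein = "MKTAYMGAWLS"  # hypothetical sequence
for label, residue in (("methionine", "M"), ("tryptophan", "W")):
    fragment = c_terminal_fragment(protein, residue)
    print(label, fragment, round(peptide_mass(fragment), 4))
```

Running the same function with two different cleavage residues gives two independent terminal masses for each protein, which is what the later examples exploit.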

In all of the examples it has been assumed that the mass spectrometer has a mass resolution of 5000. This means that the instrument can resolve a difference in mass of 1 part in 5000, or in other words a molecule of mass 5000 daltons can be resolved from a molecule of mass 4999 daltons. It is also assumed that each peptide only gives rise to a single molecular ion peak.
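The resolution criterion can be written as a one-line test. The sketch below is illustrative only; the function name is hypothetical, and treating a difference of exactly one part in 5000 as resolvable is an assumption chosen to match the 5000 versus 4999 dalton example.

```python
def resolvable(mass_a, mass_b, resolution=5000):
    """True if two masses differ by at least one part in `resolution`.

    The smallest resolvable difference is taken relative to the heavier
    of the two masses.
    """
    heavier = max(mass_a, mass_b)
    return abs(mass_a - mass_b) >= heavier / resolution


print(resolvable(5000.0, 4999.0))  # 1 Da apart at 5000 Da
print(resolvable(1000.0, 999.9))   # 0.1 Da apart, but 0.2 Da is needed at 1000 Da
```

Because the threshold scales with mass, small peptides can be told apart by smaller absolute mass differences than large ones.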

Example 1 Cleavage with Lys-C and SCN

In this example the 1882 H. influenzae proteins are cleaved with the endoproteinase Lys-C, which cuts proteins at the C-terminal side of a lysine residue. The peptides are then reacted with a solid phase support derivatised with phenylisothiocyanate moieties, which readily react with primary amines generated at the cleavage sites and present on lysine side chains as described above. All peptides have at least 1 lysine residue except the C-terminal peptide, which has none. All peptides are captured onto the solid support by their N-terminal primary amines. All non C-terminal peptides are also captured by their lysine side chains. Phenylisothiocyanate can be induced to cleave the N-terminal residue of all peptides. All lysine residue carrying peptides will still remain attached to the support, as these isothiocyanate derivatives cannot cleave. Thus C-terminal peptides are released from the solid phase support. The C-terminal peptides can then be isolated. They may be modified further if desired for reasons discussed above. After isolation of the C-terminal peptides and any additional side chain modifications the peptides may be analysed by mass spectrometry.

It is assumed that no other side chains are capped and the unmodified peptides are analysed directly.

Results

-   1882 protein records analysed.
-   7 proteins did not have a cleavage site for the reagent used.
-   870 peptide mass(es) shared by 1 protein(s)
-   155 peptide mass(es) shared by 2 protein(s)
-   43 peptide mass(es) shared by 3 protein(s)
-   19 peptide mass(es) shared by 4 protein(s)
-   6 peptide mass(es) shared by 5 protein(s)
-   9 peptide mass(es) shared by 6 protein(s)
-   2 peptide mass(es) shared by 7 protein(s)
-   3 peptide mass(es) shared by 8 protein(s)
-   1 peptide mass(es) shared by 9 protein(s)
-   1 peptide mass(es) shared by 10 protein(s)
-   2 peptide mass(es) shared by 11 protein(s)
-   2 peptide mass(es) shared by 12 protein(s)
-   1 peptide mass(es) shared by 13 protein(s)
-   1 peptide mass(es) shared by 17 protein(s)
-   1 peptide mass(es) shared by 19 protein(s)
-   1 peptide mass(es) shared by 23 protein(s)
-   1 peptide mass(es) shared by 33 protein(s)
-   1 peptide mass(es) shared by 205 protein(s)

This peptide mass fingerprint identifies 46.2% of proteins uniquely on the basis of their terminal peptide mass alone.
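The quoted percentage follows directly from the results list: it is the number of peptide masses shared by exactly one protein divided by the number of protein records. The sketch below recomputes it from the Example 1 figures; the variable and function names are hypothetical.

```python
# Example 1 results: {number of proteins sharing a mass: number of such masses}
example_1 = {
    1: 870, 2: 155, 3: 43, 4: 19, 5: 6, 6: 9, 7: 2, 8: 3, 9: 1, 10: 1,
    11: 2, 12: 2, 13: 1, 17: 1, 19: 1, 23: 1, 33: 1, 205: 1,
}


def proteins_covered(histogram):
    """Total proteins accounted for: each mass counts once per sharing protein."""
    return sum(shared_by * count for shared_by, count in histogram.items())


def percent_unique(histogram):
    """Percentage of proteins whose terminal peptide mass is shared with no other."""
    return 100 * histogram.get(1, 0) / proteins_covered(histogram)


print(proteins_covered(example_1))          # matches the 1882 records analysed
print(round(percent_unique(example_1), 1))  # matches the 46.2% quoted
```

The histogram sums to all 1882 records, which suggests that the proteins lacking a cleavage site are still counted in the distribution rather than excluded from it.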

Example 2 Capping hydroxy side-chains

In this example the reactive hydroxyl carrying side chains of serine, threonine and tyrosine are capped with a silyl protecting group, generating for example trimethylsilyl derivatives at these side chains. The side chain carboxyl groups are activated with an appropriate reagent, such as tetraphenylpyrophosphate, and these are then converted to piperidine amides by reaction with piperidine in the presence of a base as described above.

Results

-   1882 protein records analysed.
-   7 proteins did not have a cleavage site for the reagent used.
-   937 peptide mass(es) shared by 1 protein(s)
-   149 peptide mass(es) shared by 2 protein(s)
-   34 peptide mass(es) shared by 3 protein(s)
-   17 peptide mass(es) shared by 4 protein(s)
-   6 peptide mass(es) shared by 5 protein(s)
-   8 peptide mass(es) shared by 6 protein(s)
-   2 peptide mass(es) shared by 7 protein(s)
-   4 peptide mass(es) shared by 8 protein(s)
-   2 peptide mass(es) shared by 10 protein(s)
-   1 peptide mass(es) shared by 11 protein(s)
-   1 peptide mass(es) shared by 12 protein(s)
-   1 peptide mass(es) shared by 13 protein(s)
-   1 peptide mass(es) shared by 17 protein(s)
-   1 peptide mass(es) shared by 19 protein(s)
-   1 peptide mass(es) shared by 23 protein(s)
-   1 peptide mass(es) shared by 33 protein(s)
-   1 peptide mass(es) shared by 205 protein(s)

This peptide mass fingerprint identifies 49.8% of proteins uniquely on the basis of their terminal peptide mass alone.

Example 3 C-terminal capture and cleavage with cyanogen bromide

In this example the 1882 H. influenzae proteins are immobilised non-covalently onto a solid phase support. The proteins are treated with trifluoroacetic anhydride. This reagent will cap the reactive side chains of lysine, serine, threonine and tyrosine to give trifluoroacetyl derivatives. This reagent will also activate side chain and terminal carboxyl groups. The C-terminal activated carboxyls spontaneously form an oxazolone. The unreacted trifluoroacetic anhydride is washed away. The activated side chains are then reacted with piperidine in the presence of a non-aqueous base, which will generate piperidine amides at the activated side chain carboxyls. The C-terminal oxazolone is then hydrolysed back to the original free carboxyl, which is then modified to permit capture onto a solid phase support, e.g. by biotinylation. The modified proteins are then cleaved with cyanogen bromide, which cleaves at methionine residues. The resultant peptides are washed in an inert solvent and released from the solid phase support on which the chemistry is performed. The C-terminal peptides are selectively biotinylated, permitting them to be captured onto an avidinated support. This allows non-terminal peptides to be washed away. The C-terminal peptides are then released from the solid support and analysed by mass spectrometry.

Results

-   1882 protein records analysed.
-   18 proteins did not have a cleavage site for the reagent used.
-   1461 peptide mass(es) shared by 1 protein(s)
-   157 peptide mass(es) shared by 2 protein(s)
-   21 peptide mass(es) shared by 3 protein(s)
-   4 peptide mass(es) shared by 4 protein(s)
-   2 peptide mass(es) shared by 5 protein(s)
-   3 peptide mass(es) shared by 6 protein(s)

This peptide mass fingerprint identifies 77.6% of proteins uniquely on the basis of their terminal peptide mass alone.

Example 4 C-terminal capture and cleavage with BNPS-skatole

In this example the 1882 H. influenzae proteins are treated as in example 3, except that after the C-terminal carboxyl has been modified to permit capture onto a solid phase support the proteins are cleaved not with cyanogen bromide but with BNPS-skatole (3-bromo-3-methyl-2-(o-nitrophenylsulphenyl)indolenine), which cleaves proteins chemically at tryptophan residues. The resultant peptides are then desorbed from their solid phase support and incubated with an avidinated support. This allows non-terminal peptides to be washed away. The C-terminal peptides are then released from the solid support and analysed by mass spectrometry. In this example, therefore, the reactive side chains of lysine, serine, threonine and tyrosine are again trifluoroacetyl derivatives and the side chain carboxyls are piperidine amide derivatives.

Results

-   1882 protein records analysed.
-   361 proteins did not have a cleavage site for the reagent used.
-   1484 peptide mass(es) shared by 1 protein(s)
-   163 peptide mass(es) shared by 2 protein(s)
-   15 peptide mass(es) shared by 3 protein(s)
-   5 peptide mass(es) shared by 4 protein(s)
-   1 peptide mass(es) shared by 7 protein(s)

This peptide mass fingerprint identifies 78.9% of proteins uniquely on the basis of their terminal peptide mass alone.

Example 5 Combining two profiles

The lists of unique proteins generated in example 3, where cleavage by cyanogen bromide was used, and in example 4, where cleavage by BNPS-skatole was used, were combined to determine the total number of proteins that were identified uniquely on the basis of their terminal peptide mass alone in at least one of the two profiles.

1774 proteins are uniquely resolved between the two profiles. This amounts to resolving 94.3% of the H. influenzae proteins in the SWISSPROT database uniquely on the basis of terminal peptide masses alone. This is a 19.5% relative improvement in resolution over the better individual profile (1774 proteins against the 1484 resolved by BNPS-skatole cleavage alone).
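Combining two profiles amounts to taking the union of the sets of proteins that each profile resolves uniquely. The sketch below illustrates this with a small set of made-up protein identifiers, then checks the percentages quoted above against the reported counts. The identifiers and function names are hypothetical; only the counts 1882, 1484 and 1774 come from the examples.

```python
def combine_profiles(unique_in_first, unique_in_second):
    """Proteins resolved uniquely by at least one of two cleavage profiles."""
    return unique_in_first | unique_in_second


# Hypothetical identifiers, for illustration only.
unique_by_cnbr = {"P1", "P2", "P3", "P5"}
unique_by_bnps = {"P2", "P4", "P5", "P6"}
print(sorted(combine_profiles(unique_by_cnbr, unique_by_bnps)))

# Arithmetic on the counts reported in examples 3 to 5.
TOTAL_PROTEINS = 1882
BEST_SINGLE_PROFILE = 1484  # example 4, BNPS-skatole
COMBINED = 1774             # example 5

percent_resolved = 100 * COMBINED / TOTAL_PROTEINS
relative_gain = 100 * (COMBINED / BEST_SINGLE_PROFILE - 1)
print(round(percent_resolved, 1), round(relative_gain, 1))
```

The union grows only by the proteins that are ambiguous in one profile but unique in the other, so the benefit of a second cleavage agent depends on how little its ambiguities overlap with those of the first.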

1. A method for characterising a population of parent polypeptides in a sample, which method comprises: (a) contacting a first portion of the sample with a first sequence-specific cleavage agent to generate polypeptide fragments; (b) isolating one or more polypeptide fragments, each fragment comprising the N-terminus or the C-terminus of the parent polypeptide from which it was fragmented; (c) identifying the isolated fragments by mass spectrometry; (d) repeating steps (a)-(c) on a second portion of the sample using a second sequence-specific cleavage agent that cleaves at a different site from the first cleavage agent, wherein the second cleavage agent is different from the first cleavage agent, and wherein the second portion of the sample is separate from the first portion of the sample; and (e) characterising the parent polypeptides in the sample from the fragments identified in steps (c) and (d).
2. A method according to claim 1, wherein the step (d) comprises repeating steps (a)-(c) two or more times, each time using a portion of sample which is separate from the previous portions, and each time using a further cleavage agent that cleaves at a different site from the previous cleavage agents.
3. A method according to claim 1 or claim 2, comprising a further capping step prior to step (a), which capping step comprises reacting the parent polypeptides in the portion of sample with one or more capping agents to introduce capping groups on one or more reactive side chains of the polypeptides.
4. A method according to claim 3, wherein the capping step and steps (a)-(c) are repeated one, two, or more times, each time using a portion of sample which is separate from the previous portions, and each time introducing capping groups at the same side chains as the previous capping steps, but using capping groups having different mass than the corresponding capping groups used in the previous capping steps.
5. A method for characterising a parent polypeptide or a population of parent polypeptides in the sample, which method comprises: (f) contacting a first portion of the sample comprising one or more polypeptides with a first capping agent in a first capping step to introduce capping groups on one or more reactive side chains of the polypeptides; (g) contacting the first sample with a sequence-specific cleavage agent to generate polypeptide fragments; (h) isolating one or more polypeptide fragments, each fragment comprising the N-terminus or the C-terminus of the parent polypeptide from which it was fragmented; (j) identifying the isolated fragments by mass spectrometry; (k) repeating steps (f)-(j) on a second portion of the sample using a second capping agent that introduces capping groups at the same side chains as the first capping step, but uses capping groups having different mass than the capping groups used in the first capping step, wherein the second portion is separate from the first portion; and (l) characterising the one or more parent polypeptides in the sample from the fragments identified in steps (j) and (k).
6. A method according to claim 5, wherein the steps (f)-(i) are repeated two or more times, each time using a portion of sample which is separate from the previous portions, and each time introducing capping groups at the same side chains as the previous capping steps, but using capping groups having different mass than the corresponding capping groups used in the previous capping steps.
7. A method according to claim 5 or claim 6, wherein the step (k) comprises repeating steps (f)-(j) one, two, or more times, each time using a portion of sample which is separate from the previous portions, and each time using a further cleavage agent that cleaves at a different site from the previous cleavage agents.
8. A method according to claim 5, wherein the side chains to be capped comprise one or more of the following: the NH₂ side chain in arginine; the NH₂ side chain in asparagine; the NH₂ side chain in glutamine; the NH₂ side chain in lysine; the COOH side chain in aspartic acid; the COOH side chain in glutamic acid; the OH side chain in serine; the OH side chain in threonine; the OH side chain in thyroxine; the OH side chain in tyrosine; and the SH side chain in cysteine.
9. A method according to claim 1, wherein the fragments are isolated by capture on a solid phase, such as isothiocyanato glass (or DITC glass) or polystyrene isothiocyanate.
10. A method according to claim 9, wherein the capture involves covalently bonding the fragments to the solid phase.
11. A method according to claim 10, wherein the fragments are bound to the solid phase through their N-termini.
12. A method according to claim 1, wherein each isolated fragment comprises the C-terminus of the parent polypeptide from which it was fragmented.
13. A method according to claim 1, wherein the cleavage agent employed comprises an endopeptidase or a chemical cleavage agent.
14. The method of claim 13, wherein the cleavage agent is at least one compound selected from the group consisting of a Lys-C endopeptidase, a thiocyanate compound, cyanogen bromide, 3-bromo-3-methyl-2(o-nitrophenyl sulphenyl) indolenine-skatole, trypsin, chymotrypsin and/or thrombin.
15. The method of claim 5 wherein the capping agent is at least one compound selected from the group consisting of an iodoacetate compound, an isocyanate compound, a silyl compound, an anhydride, a vinylsulphone compound and a vinyl pyridine derivative.
16. The method of claim 1 which method further comprises characterizing the one or more parent polypeptides in the sample by comparison with a database based on terminal peptide mass.
17. A method for detecting the expression of one or more proteins in a tissue, which method comprises characterizing a population of parent polypeptides in a sample of tissue comprising the following steps: (a) contacting a first portion of the sample comprising one or more parent polypeptides with a first sequence-specific cleavage agent to generate polypeptide fragments; (b) isolating one or more polypeptide fragments, each fragment comprising the N-terminus or the C-terminus of the parent polypeptide from which it was fragmented; (c) identifying the isolated fragments by mass spectrometry; (d) repeating steps (a)-(c) on a second portion of the sample using a second sequence-specific cleavage agent that cleaves at a different site from the first cleavage agent, wherein the second cleavage agent is different from the first cleavage agent, and wherein the second portion of the sample is separate from the first portion of the sample; and (e) characterizing the one or more parent polypeptides in the sample from the fragments identified in steps (c) and (d).
18. The method of claim 17 which further comprises characterizing the population of parent polypeptides by comparison with a database based on terminal peptide mass.
19. A method for assaying for one or more specific polypeptides in a sample, which method comprises characterizing a population of parent polypeptides comprising the following steps: (a) contacting a first portion of the sample comprising one or more parent polypeptides with a first sequence-specific cleavage agent to generate polypeptide fragments; (b) isolating one or more polypeptide fragments, each fragment comprising the N-terminus or the C-terminus of the parent polypeptide from which it was fragmented; (c) identifying the isolated fragments by mass spectrometry; (d) repeating steps (a)-(c) on a second portion of the sample using a second sequence-specific cleavage agent that cleaves at a different site from the first cleavage agent, wherein the second cleavage agent is different from the first cleavage agent, and wherein the second portion of the sample is separate from the first portion of sample; (e) characterizing the one or more parent polypeptides in the sample from the fragments identified in steps (c) and (d); and (f) determining the presence or absence of said one or more specific polypeptides based on the presence or absence of one or more specific fragments corresponding to said polypeptides.
20. The method of claim 19, which further comprises characterizing said population of parent polypeptides by determining the presence or absence of said one or more specific polypeptides by comparison to a database based on terminal peptide mass.